
P/9760GB 1 

APPARATUS FOR DETECTING AND RECOVERING DATA 
Field of Invention 

The present invention relates to methods and apparatus for detecting and 
recovering data embedded in information material. 

Information material as used herein refers to and includes one or more of video 
material, audio material and data material. Video material in this context may be still 
images or moving images. 
Background of Invention 

Steganography is a technical field relating to the embedding of data into 
material such as video material, audio material and data material in such a way that the 
data is imperceptible in the material. 

Watermarks are data embedded in material such as video material, audio 
material and data material. A watermark may be imperceptible or perceptible in the 
material. 

A watermark may be used for various purposes. It is known to use watermarks 
for the purpose of protecting the material against, or tracing, infringement of the
intellectual property rights of the owner(s) of the material. For example a watermark 
may identify the owner of the material. 

Watermarks may be "robust" in that they are difficult to remove from the 
material. Robust watermarks are useful to trace the provenance of material which is 
processed in some way either in an attempt to remove the mark or to effect legitimate 
processing such as video editing or compression for storage and/or transmission. 
Watermarks may be "fragile" in that they are easily damaged by processing, which is
useful for detecting attempts to remove the mark or process the material.

Visible watermarks are useful to allow, for example, a customer to view an 
image via, for example, the Internet to determine whether they wish to buy it but 
without allowing the customer access to the unmarked image they would buy. The 
watermark degrades the image and the mark is preferably not removable by the 
customer. Visible watermarks are also used to determine the provenance of the 
material into which they are embedded. 






In US patent 5,930,369 (Cox et al), it has been proposed to embed data into 
material such as images to form a watermark by converting the material into the 
transform domain and adding the data to the image in the transform domain. For the 
example of images and the Discrete Wavelet Transform of these images into the 
transform domain, the data to be added can be combined with the wavelet coefficients 
of one of a plurality of sub-bands which are formed in the transform domain. 
Generally, the data to be embedded is arranged to modulate a predetermined data 
sequence such as a Pseudo Random Bit Sequence (PRBS). For example, each bit of 
the data to be embedded is arranged to modulate a copy of the PRBS, and this copy is 
then added, for example into one of the sub-bands of the image in the transform 
domain. The image is then converted back to the spatial domain. 

If it is desired to detect and recover the embedded data from the image, the 
image is converted back to the transform domain and the embedded data is recovered 
from the sub-band in the transform domain by cross-correlating the transform 
coefficients in the sub-band with the Pseudo Random Bit Sequence which is known to 
the detecting apparatus. 

Generally it is desirable to reduce to a minimum any perceivable effect that the 
embedded data may have on the information material such as images. However it is 
also desirable to increase the likelihood of correctly recovering the embedded data 
from the information material, in spite of errors which may be introduced as a result of
any processing which may be performed on the material. 
Summary of Invention 

According to the present invention there is provided an apparatus for detecting 
and recovering data embedded in information material, the data comprising a plurality 
of source data items each having been encoded in accordance with a systematic error 
correction code to produce encoded data items each comprising the corresponding 
source data item and redundant data, the encoded data items being embedded in the 
information material, the apparatus comprising an embedded data detector operable to 
detect and generate a recovered version of the encoded data from the information 
material, an error processor operable, for each of the recovered encoded data items, to 
determine whether the recovered encoded data item is deemed too errored, and if not,
decoding the encoded data item to generate a recovered version of the data item, a data
store for storing the recovered version of the data item, and a recovery data processor
operable, if the error processor determines that one of the recovered encoded data
items is deemed too errored, to compare the source data item of the encoded data item
with at least one other source data item from the data store, and to estimate the source
data item of the errored encoded data item in dependence upon a corresponding value
of the at least one other recovered data item.

The term systematic code is used in the technical field of error correction
coding to refer to an error correction code or encoding process in which the
original source data appears as part of the encoded data in combination with redundant
data added by the encoding process. For a non-systematic encoding process, the input
data does not appear as part of the encoded data.
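
The distinction can be illustrated with a very small systematic code. The sketch below uses a Hamming(7,4) code purely as an assumption for clarity; the function name is hypothetical and this code is not the one used in the example embodiment.

```python
# Illustrative systematic Hamming(7,4) encoder. This small code is an
# assumption made for clarity, not the code of the example embodiment.

def hamming74_encode(data):
    """Return the 4 source bits followed by 3 redundant parity bits."""
    d0, d1, d2, d3 = data
    parity = [d0 ^ d1 ^ d3, d0 ^ d2 ^ d3, d1 ^ d2 ^ d3]
    # Systematic property: the source bits appear unchanged at the front.
    return list(data) + parity


source = [1, 0, 1, 1]
codeword = hamming74_encode(source)
assert codeword[:4] == source
```

Because the source bits are readable directly from the codeword, a receiver can inspect them even when error correction decoding fails.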

Embodiments of the present invention address a technical problem of
recovering data embedded in information material, when errors have been introduced
into the data as a result for example of the embedding process, or as a result of
processes performed on the information material in which the data is embedded. For
the example of information material such as images, data is embedded into the images
so that the effect of the embedded data is difficult to perceive and is as far as possible
imperceptible. As such, the strength with which the data is embedded is reduced to a
minimum level which is still sufficient to ensure that the data can be recovered from
the image. However, if the image is processed in some way, such as if the image is
compression encoded, errors may be introduced into the embedded data as a result of
the compression encoding process. This is because typically an effect of compression
encoding is to alter or discard components of the image. These components may be
bearing the embedded data. In addition, inaccuracies as a result for example of
quantization errors may be present when detecting and recovering the embedded data,
further contributing to errors in the recovered data.

It is known to protect against errors by encoding data using an error correction
code. Error correction encoded data typically can be used to correct a certain number
of errors in each encoded data word and typically to detect a greater number of errors
in each encoded data word.

Embodiments of the present invention utilise systematic codes in which the
data items to be embedded appear as part of the error correction encoded form of the
data items. Typically, the data items to be embedded may be correlated in some way.
As such, if an encoded data item is deemed too errored and therefore not recoverable
by error correction decoding, the data item may be recovered from the errored encoded
data item, by comparing the data item with at least one other recovered data item, and
estimating the data item in accordance with a correlation between the recovered data
items.

A recovered encoded data item may be deemed too errored if the error
processor is operable to determine the number of errors in the encoded data item, and
to compare the number of errors with a predetermined threshold. If the number of
errors is greater than the threshold, then the encoded data item is deemed to have too
many errors to be safely decoded. Alternatively, the error correction decoding
process performed by the error processor may provide an indication, as part of this
process, that the encoded data item cannot be recovered because there are too many
errors.

Although the errored data item may be recovered from one other recovered
data item, in preferred embodiments, the errored data item may be recovered by
comparing the errored data item with a previous and a subsequent recovered data item.
The errored data item may be recovered by interpolating between the previous and
subsequent data items, if these data items are different, or by replacing the value of the
errored data item with the value shared by the previous and subsequent data
items, if they are the same.

Although embodiments of the present invention find application in recovering
any data items which have been error correction encoded and embedded into
information material, in preferred embodiments, the data items have a plurality of data
fields. Accordingly, in preferred embodiments the recovery processor may be
operable to compare at least one of the data fields, for an errored encoded data item
which cannot be decoded, with the corresponding field of at least one of a previous
and a subsequent data item, and to replace the data field of the errored encoded data
item in accordance with the corresponding data field of one of the previous and
subsequent data items. The data field may be replaced with the value of the
corresponding data field of one of the previous and the subsequent data items, if the
data fields of the previous and subsequent data items are the same, or may be replaced
by a value determined by interpolating between the value of the corresponding data
field of the previous and subsequent data items.

After replacing the data field of the encoded data item which is deemed to be
too errored, the recovery processor is operable to determine, in combination with the
error processor, whether the recovered encoded data item in which the data field is
replaced is deemed to be too errored, and if not, to decode the encoded data item to
form a recovered version of the data item. It is likely that the errors in the encoded
data item will be distributed throughout this data item, so that by replacing the data
field with a value determined from the already decoded data items, at least some of
these errors may have been corrected. As such, the number of errors in the adapted
encoded data item may now be a number which can be corrected by error correction
decoding. The error processor is therefore arranged to re-apply error correction
decoding so as to recover the entire data item, if the encoded data item is
deemed to be recoverable.

In preferred embodiments the data items may be meta data representative of the 
content of the information material. For example the data items may be Universal 
Material Identifiers (UMIDs), the data fields being the data fields of the UMIDs. 

Although embodiments of the invention find application in detecting and 
recovering data from any information material, a particular application of the invention 
is in detecting and recovering data embedded in video image or audio signals. 

Various further aspects and features of the present invention are defined in the 
appended claims. 

Brief Description of the Drawings 

Figure 1 is a schematic block diagram of a watermarking system; 
Figure 2 is a schematic block diagram of a watermark embedder appearing in 
Figure 1; 

Figure 3 is a schematic representation of a UMID encoded by the error
correction encoder shown in Figure 2, using a systematic error correction code;

Figure 4 is a schematic block diagram of a combiner forming part of the
watermark embedder shown in Figure 2;

Figure 5 provides an illustrative representation of a transform domain image
with which data is combined;

Figure 6 is a schematic block diagram of a watermark decoder appearing in
Figure 1;

Figure 7 is a schematic block diagram of an error correction decoder according
to an embodiment of the present invention;

Figure 8 provides an illustrative representation of a process of recovering data
items, performed by a recovery processor forming part of the error correction decoder
shown in Figure 7;

Figures 9A and 9B are schematic block diagrams of the structure of an
extended and a basic UMID respectively.
Description of Preferred Embodiments 

An example embodiment of the present invention will be described with
reference to a watermarking system in which data is embedded into a video image.
Any type of data can be embedded into the image. However, advantageously the data
embedded into the image may be meta data which describes the image or identifies
some attributes of the content of the image itself. An example of meta data is the
Universal Material Identifier (UMID). A proposed structure for the UMID is disclosed
in SMPTE Journal March 2000. A more detailed explanation of the structure of the
UMID will be given later.
Watermarking System 

Figure 1 illustrates a watermarking system, generally 10, for embedding a
watermark into a video image 115, and recovering and removing a watermark from the
video image 115. The watermarking system 10 in Figure 1 comprises an image
processor 100 for embedding the watermark into the video image, and a decoding
image processor 102 for detecting and recovering the watermark, and for removing or
'washing' the watermark from the video image.

The image processor 100 for embedding the watermark into the video image
comprises a strength adapter 180, and a watermark embedder 120. The watermark
embedder 120 is arranged to embed the watermark into the video image 115, produced
from the source 110, to form a watermarked image 125. The watermark to be
embedded into the video image is formed from data 175 representing a UMID.
Generally, the UMID identifies the content of the video image, although it will be
appreciated that other types of meta data which identify the content or other attributes
of the image can be used to form the watermark. In preferred embodiments the
watermark embedder 120 embeds the UMID into the video image 115 in accordance
with a particular application strength 185 from the strength adapter 180. The strength
adapter 180 determines the magnitude of the watermark in relation to the video image
115, the application strength being determined such that the watermark may be
recovered whilst minimising any effects which may be perceivable to a viewer of the
watermarked image 125. After embedding the watermark, the image may be
transmitted, stored or further processed in some way, such as for example,
compression encoding the image. This subsequent processing and transmitting is
represented generally in Figure 1 as line 122.

In Figure 1 the decoding image processor 102 for detecting and removing the 
watermark is shown as comprising a watermark decoder 140, a data store 150 and a 
watermark washer 130 which removes the watermark from the watermarked image 
125. 

The watermark decoder 140 detects the watermark from the watermarked video 
image and in the present example embodiment, generates a restored UMID 145 from 
the watermarked image 125. The watermark washer 130 generates a restored image 
135, by removing as far as possible, the watermark from the watermarked image 125. 
In some embodiments, the watermark washer 130 is operable to remove the watermark 
from the image substantially without leaving a trace. The restored image 135 may
then be stored in a store 150, transmitted or routed for further processing.
The Watermark Embedder 

The watermark embedder will now be described in more detail with reference
to Figure 2, where parts also appearing in Figure 1 have the same numerical
references. In Figure 2 the watermark embedder 120 comprises a pseudo-random
sequence generator 220, an error correction encoder 200, a wavelet transformer 210,
an inverse wavelet transformer 250, a modulator 230 and a combiner 240.

The error correction encoder 200 receives the UMID 175 and generates an 
error correction encoded UMID comprising redundant data in combination with the 
UMID, in accordance with an error correction encoding scheme. It will be appreciated 
that various error correction coding schemes could be used to encode the UMID. 
However, in accordance with an embodiment of the present invention, the error 
correction code which is used by the error correction encoder 200 to encode the UMID 
is a systematic code. For the example embodiment the systematic code is a
Bose-Chaudhuri-Hocquenghem (BCH) code providing 511 bit codewords comprising 248
source bits of the UMID and 263 redundant parity bits. This is represented in
Figure 3 where the UMID is illustrated as having only three data fields although as
will be explained shortly these are just an example of three of the data fields which
appear in the UMID. These data fields D1, D2, D3 will be used to illustrate the
example embodiment of the present invention.
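
As a quick check on the figures quoted above, the codeword length, source length and redundancy are related as follows. The symbols n, k and R are conventional coding-theory notation introduced here for illustration and do not appear in the original text.

```latex
n = 511 = 2^{9} - 1, \qquad k = 248, \qquad n - k = 263,
\qquad R = \frac{k}{n} = \frac{248}{511} \approx 0.485
```

In other words, slightly more than half of each embedded codeword is redundant data.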

It will be appreciated that the present invention is not limited to any particular 
error correction encoding scheme, so that other BCH codes, or for example Reed- 
Solomon codes or convolutional codes could be used to encode the UMIDs. However, 
the encoding scheme should be arranged to encode the data items (UMIDs) in 
accordance with systematic codes, wherein the source data appears with redundant 
data added by the encoding scheme in the encoded form. 

As shown in Figure 2 the error correction encoded UMID 205 is received at a 
first input to the modulator 230. The pseudo-random sequence generator 220 outputs a 
PRBS 225 which is received at a second input to the modulator 230. The modulator 
230 is operable to modulate each copy of a PRBS, generated by the pseudo-random 
sequence generator 220, with each bit of the error correction encoded UMID. The 
encoded UMID is therefore arranged to modulate the PRBS to form a spread spectrum 
encoded data signal. In preferred embodiments the data is modulated by representing
the values of each bit of the PRBS in bipolar form ('1' as +1, and '0' as -1) and then
reversing the polarity of each bit of the PRBS, if the corresponding bit of the encoded
UMID is a '0' and not reversing the polarity if the corresponding bit is a '1'. The
modulated PRBS is then received at a first input of the combiner 240. The combiner
240 receives at a second input the image in which the PRBS modulated data is to be
embedded. However the data is combined with the image in the transform domain.
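
The modulation rule described above can be sketched in a few lines. This is a minimal illustration only; the function names are hypothetical, and the PRBS is passed in as a plain list of bits.

```python
def to_bipolar(bits):
    """Map '1' to +1 and '0' to -1."""
    return [1 if b else -1 for b in bits]


def modulate(encoded_bits, prbs_bits):
    """Spread each encoded data bit over one copy of the PRBS.

    A data bit of '0' reverses the polarity of the PRBS copy;
    a data bit of '1' leaves it unchanged.
    """
    chips = to_bipolar(prbs_bits)
    modulated = []
    for bit in encoded_bits:
        sign = 1 if bit else -1
        modulated.extend(sign * chip for chip in chips)
    return modulated


# Two data bits spread over a 3-bit PRBS give six bipolar values.
spread = modulate([1, 0], [1, 0, 1])
```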

The use of a pseudo-random bit sequence (PRBS) 225 to generate the spread 
spectrum signal representing the watermark data allows a reduction to be made in the 
strength of the data to be embedded in the image. By cross-correlating the data in the
transform domain image, to which the modulated PRBS has been added, with the PRBS,
a correlation output signal is produced with a so-called correlation coding gain which
allows the modulated data bit to be detected and determined. As such, the strength of the data
added to the image can be reduced, thereby reducing any perceivable effect on the 
spatial domain image. The use of a spread spectrum signal also provides an inherent 
improvement in robustness of the image because the data is spread across a larger 
number of transform domain data symbols. 

As shown in Figure 2, the wavelet transformer 210 receives the video image
115 from the source 110 and outputs a wavelet image 215 to the combiner 240. The
image is thus converted from the spatial to the transform domain. The combiner 240 is 
operable to add the PRBS modulated data to the image in the transform domain, in 
accordance with the application strength, provided by the strength adapter 180. The 
watermarked wavelet image 245 is then transformed into the spatial domain by the 
inverse wavelet transformer 250 to produce the watermarked image 125. The 
operation of the combiner 240 will be explained in more detail shortly. 

The skilled person will be acquainted with the wavelet transform and variants.
A more detailed description of the wavelet transform is provided in for example "A
Really Friendly Guide to Wavelets" by C Valens, 1999 (c.valens@mindless.com).

Although in the example embodiment of the present invention the data is
embedded in the image in the wavelet transform domain, it will be appreciated that the
present invention is not limited to the wavelet transform and the data could be added to the
image using any transform such as the Discrete Cosine Transform or the Fourier
Transform. Furthermore the data could be combined with the image in the spatial
domain without forming a transform of the image.
Combiner

The operation of the combiner 240 will now be explained in more detail. The
combiner 240 receives the wavelet image 215 from the wavelet transformer 210, and
the modulated PRBS from the modulator 230 and the application strength 185 from the
strength adapter 180. The combiner 240 embeds the watermark 235 onto the wavelet
image 215, by adding, for each bit of the modulated PRBS, a factor $\alpha$ scaled by ±1, in
dependence upon the value of the bit. Selected parts of the wavelet image 215 are
used to embed the watermark 235. Each pixel of the predetermined region of the
wavelet image 215 is encoded according to the following equation:

$$X'_i = X_i + \alpha_n W_i \qquad (1)$$

where $X_i$ is the i-th wavelet coefficient, $\alpha_n$ is the strength for the n-th PRBS
and $W_i$ is the i-th bit of the PRBS to be embedded in bipolar form.
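
Equation (1) amounts to an element-wise addition over the selected coefficients. The sketch below assumes, for simplicity, a single strength value for the whole sequence rather than a separate strength per PRBS; the function name and the sample values are hypothetical.

```python
def combine(coefficients, modulated_prbs, alpha):
    """Add the scaled bipolar PRBS values to the selected wavelet coefficients.

    Each output value is the input coefficient plus alpha times the
    corresponding bipolar PRBS value, as in equation (1), with one
    strength alpha assumed for the whole sequence.
    """
    if len(coefficients) != len(modulated_prbs):
        raise ValueError("one PRBS value is needed per selected coefficient")
    return [x + alpha * w for x, w in zip(coefficients, modulated_prbs)]


marked = combine([10.0, -4.0, 2.5], [1, -1, 1], alpha=0.5)
```

A smaller alpha perturbs the coefficients less, which reduces visibility at the cost of a weaker correlation at the detector.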

An example of the combiner is shown in Figure 4. In Figure 4 the combiner 
240 is shown to receive the transform domain image from the connecting channel 215 
which provides the transform domain image to a frame store 236. The frame store 236 
is arranged to store a frame of transform domain data. The combiner 240 is also 
arranged to receive the spread spectrum encoded and error correction encoded UMID 
after it has been spread using the PRBS (modulated PRBS data). For this example
embodiment one UMID in this error correction and spread spectrum encoded form is 
to be embedded in the frame of image data within the frame store 236. Thus, each 
encoded UMID forms an item of data which is to be embedded into each frame of 
image data. To this end, the frame store stores a frame of data representing the image 
in the wavelet transform domain. The data to be embedded is received at a combining 
processor 237 which combines the data to be embedded into selected parts of the 
wavelet transform domain image stored in the frame store 236. The combiner 240 is 
also provided with a control processor 238 which is coupled to the combining 
processor 237. 

In Figure 5 an illustrative representation of a first order wavelet transform is
presented. This wavelet transform is representative of a frame of the image
transformed into the wavelet domain and stored in the frame store 236. The wavelet
transform image WT_IMG is shown to comprise four wavelet domains representative
of sub-bands into which the image has been divided. The sub-bands comprise a low
horizontal, low vertical frequencies sub-band lH1lV1, the high horizontal, low vertical
frequencies sub-band hH1lV1, the low horizontal, high vertical frequencies sub-band
lH1hV1 and the high horizontal, high vertical frequencies sub-band hH1hV1.

In the example embodiment of the present invention, the data to be embedded
is only written into the low vertical, high horizontal frequencies sub-band hH1lV1 and
the low horizontal, high vertical frequencies sub-band labelled lH1hV1.

By embedding the data in only the two sub-bands hH1lV1, lH1hV1, the
likelihood of detecting the embedded data is improved whilst the effects that the
embedded data will have on the resulting image are reduced. This is because the
wavelet coefficients of the high horizontal, high vertical frequencies sub-band
hH1hV1 are more likely to be disturbed, by for example compression encoding.
Compression encoding processes such as JPEG (Joint Photographic Experts Group)
operate to compression encode images by reducing the high frequency components of
the image. Therefore, writing the data into this sub-band hH1hV1 would reduce the
likelihood of being able to recover the embedded data. Conversely, data is also not
written into the low vertical, low horizontal frequencies sub-band lH1lV1. This is
because the human eye is more sensitive to the low frequency components of the
image. Therefore, writing the data in the low vertical, low horizontal frequencies
sub-band would have a more disturbing effect on the image. As a compromise the data is
added into the high horizontal, low vertical frequencies sub-band hH1lV1 and the low
horizontal, high vertical frequencies sub-band lH1hV1.
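
The four sub-bands of a first order transform can be illustrated with a simple averaging Haar filter. The text does not name the wavelet used, so the Haar filter, the function name, and the convention that a difference along a row counts as horizontal detail are all assumptions of this sketch.

```python
def haar_level1(image):
    """Split an image (list of rows, even width and height) into four sub-bands.

    Uses an averaging Haar filter on each 2x2 block; the filter choice is
    an assumption made for illustration only.
    """
    bands = {"lH1lV1": [], "hH1lV1": [], "lH1hV1": [], "hH1hV1": []}
    for r in range(0, len(image), 2):
        ll, hl, lh, hh = [], [], [], []
        for c in range(0, len(image[0]), 2):
            a, b = image[r][c], image[r][c + 1]
            d, e = image[r + 1][c], image[r + 1][c + 1]
            ll.append((a + b + d + e) / 4)  # local average
            hl.append((a - b + d - e) / 4)  # change along a row
            lh.append((a + b - d - e) / 4)  # change between rows
            hh.append((a - b - d + e) / 4)  # diagonal detail
        bands["lH1lV1"].append(ll)
        bands["hH1lV1"].append(hl)
        bands["lH1hV1"].append(lh)
        bands["hH1hV1"].append(hh)
    return bands


# Only the two mixed sub-bands would be used to carry the embedded data.
bands = haar_level1([[1, 0], [1, 0]])
embedding_bands = [bands["hH1lV1"], bands["lH1hV1"]]
```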
Decoder 

The operation of the watermark decoder 140 in the decoding image processor
will now be explained in more detail, with reference to Figure 6, where parts also
appearing in Figure 1 bear identical reference numerals. The watermark decoder 140
receives the watermarked image 125 and outputs a restored version of the UMID 145.
The watermark decoder 140 comprises a wavelet transformer 310, a pseudo-random
sequence generator 320, a correlator 330, and an error correction decoder 350.
Optionally, in alternative embodiments, an analysis processor 360 may be provided, as
will be explained shortly.

The wavelet transformer 310 converts the watermarked image 125 into the
transform domain so that the watermark data can be recovered. The wavelet
coefficients to which the PRBS modulated data were added by the combiner 240 are
then read from the two wavelet sub-bands hH1lV1, lH1hV1 in the reverse direction to
the direction in which the data was added in the combiner 240. These wavelet
coefficients are then correlated with respect to the corresponding PRBS used in the
watermark embedder. Generally, this correlation is expressed as equation (2), below,
where $X_n$ is the n-th wavelet coefficient and $R_i$ is the i-th bit of the PRBS generated by
the Pseudo Random Sequence Generator 320.
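
A correlation of this kind can be written, as a sketch consistent with the definitions above, in the following form. The symbol s, denoting the number of bits in the PRBS, is an assumption introduced here and is not defined in the text.

```latex
C_n = \sum_{i=1}^{s} X_{sn+i} \, R_i \qquad (2)
```

Under this form, each group of s consecutive coefficients is multiplied by the bipolar PRBS and summed to give one correlation value per embedded bit.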

The relative sign of the result of the correlation $C_n$ gives an indication of the
value of the bit of the embedded data in correspondence with the sign used to represent
this bit in the watermark embedder. The data bits recovered in this way represent the
error correction encoded UMID which is subsequently decoded by the error correction
decoder 350 using a decoding algorithm for the error correction code used by the
encoder 200. Having recovered the UMID, the watermark can be removed from the
video image by the watermark washer 130, by performing the reverse of the operations
performed by the embedder.
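
The embedding and sign-based detection described above can be sketched end to end. This is an illustration under stated assumptions: the PRBS is drawn from Python's seeded random generator rather than a shift register, the host coefficients are synthetic, and all function names and parameter values are hypothetical.

```python
import random


def prbs(length, seed):
    """Bipolar pseudo-random sequence; the seeded generator is an assumption."""
    rng = random.Random(seed)
    return [rng.choice((-1, 1)) for _ in range(length)]


def embed(coefficients, bits, chips, alpha):
    """Add one polarity-controlled copy of the PRBS per data bit."""
    out = list(coefficients)
    s = len(chips)
    for n, bit in enumerate(bits):
        sign = 1 if bit else -1
        for i in range(s):
            out[n * s + i] += alpha * sign * chips[i]
    return out


def detect(coefficients, chips, n_bits):
    """Recover each bit from the sign of its correlation with the PRBS."""
    s = len(chips)
    bits = []
    for n in range(n_bits):
        c_n = sum(coefficients[n * s + i] * chips[i] for i in range(s))
        bits.append(1 if c_n > 0 else 0)
    return bits


# Synthetic host coefficients with unit spread and a weak watermark.
chips = prbs(256, seed=1)
host = [random.Random(2).gauss(0.0, 1.0) for _ in range(3 * 256)]
marked = embed(host, [1, 0, 1], chips, alpha=0.5)
recovered = detect(marked, chips, 3)
```

With these values the watermark contributes 0.5 x 256 = 128 to each correlation, while the host contribution is typically of the order of the square root of 256, that is 16. This is the correlation coding gain that allows a low embedding strength.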

Figure 7 provides a more detailed block diagram of the error correction 
decoder 350 in accordance with an example embodiment of the present invention. In 
Figure 7 the error correction encoded UMIDs which have been recovered from the 
video images by the correlator 335 are received from the connecting channel 345 by an 
error processor 400. The encoded UMIDs are also received from the connecting 
channel 345 by a recovery processor 404. 

The error processor 400 operates to perform an error detection and/or 
correction process in order to attempt to recover the UMID. 

It is known that an error correction code can correct a certain number of errors
and detect a certain number of errors, the number of errors which can be detected
being generally greater than the number that can be corrected. Thus in general the
ability of an error correction code to detect errors is greater than the ability of a
decoder to correct these errors.
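
For a block code this relationship follows from the minimum Hamming distance between codewords. The symbols below are standard coding-theory notation introduced here for illustration; they do not appear in the original text.

```latex
t_{\text{correct}} = \left\lfloor \frac{d_{\min} - 1}{2} \right\rfloor ,
\qquad
t_{\text{detect}} = d_{\min} - 1
```

For example, a code with minimum distance 4 can correct any single error but can detect up to three errors when it is used for detection only.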

The number of errors in the recovered encoded UMID may be too great for the 
UMID to be recovered from error correction decoding. However the error processor 
has a facility for detecting whether error correction is possible. For the example BCH 
code, a decoding process for the BCH code can provide an indication that error 
correction is not possible, when error correction is applied. 

Alternatively, the error processor 400 may first perform an error detection
process, in order to detect the number of errors in the encoded UMID. If the number 
of errors is greater than a predetermined threshold the encoded UMID is determined to 
be unrecoverable because the encoded UMID has too many errors. The predetermined 
threshold is set to reduce the likelihood of falsely decoding an encoded UMID as a 
result of the number of errors being greater than the number which can be decoded by 
the error correction code. Typically, the threshold may be set in accordance with a 
compromise between a number of errors which can be corrected by error correction 
decoding, and reducing the likelihood of a number of errors being so large as to be 
incorrectly decoded. 

If an encoded UMID is declared as being too errored, a control processor 406 
controls the recovery processor 404 via a control channel 408 to perform a recovery 
process on the errored UMID to attempt to recover the UMID. 

If the error processor 400 determines that the number of errors present in the 
encoded UMID is recoverable using error correction decoding, then the control 
processor 406 controls the error processor 400 to decode the encoded UMID to 
provide a recovered version of the UMID which is output from the output channel 145. 
This recovered UMID however is also stored under control of the control processor 
406 within a data store 410. 

Returning to the operation of the recovery processor 404, the embodiment of 
the present invention shown in Figure 7 utilises the nature of the systematic error 
correction code used to encode the UMID. As shown in Figure 3 the UMID as source 
data appears as part of the encoded code word. As such, the recovery processor has
access to the UMID albeit in a form in which there are errors present. The recovery
processor may store the errored UMID locally until a subsequent UMID has been
decoded and recovered and stored in the data store 410. Accordingly, the recovery
processor may then compare the data fields of the errored UMID with those of one or
both of a previous successfully decoded UMID and a subsequent successfully decoded
UMID. This is represented in Figure 8 where UMID-1 and UMID+1 are representative
of successfully recovered previous and subsequent UMIDs which are shown with the
present UMID 0. As represented by hashed sections, the present UMID 0 is deemed
not recoverable.

In order to recover the data from the UMID, the recovery processor 404
compares the values in the data fields of the UMIDs. The UMID may include a clip ID
which identifies the clip or take of the video images in which the UMIDs
have been embedded. Accordingly, for example the first data field D1 may represent
the clip ID. If the clip ID D1 is the same in UMID-1 and UMID+1, then it is likely
that data field D1 of the UMID 0 which cannot be recovered should be the same as the
clip ID in UMID-1 and UMID+1. Therefore the recovery processor compares data
field D1 of UMID-1 and UMID+1 and if these are the same it sets data field D1 of the
UMID 0 to this value.

At this point, the control processor may then attempt to decode the adapted
encoded UMID. This is because, if the number of errors still present in the encoded
UMID after the first data field D1 has been replaced is less than the number which can
be corrected, then the error correction code can be used to recover the UMID.
If this number is less than the predetermined threshold, the adapted encoded UMID in
which the data field D1 has been replaced is fed to the error processor 400. The error
processor as before determines whether the encoded UMID can be recovered by error
correction decoding, thereby recovering the rest of the UMID including the second and
third data fields D2, D3.
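
The gating step described above might look like the sketch below. The error
estimator and the decoder are passed in as callables because the text does not fix
a particular code; `retry_decode` and its parameter names are hypothetical.

```python
from typing import Callable, List, Optional, Sequence


def retry_decode(
    adapted_codeword: Sequence[int],
    estimate_errors: Callable[[Sequence[int]], int],
    decode: Callable[[Sequence[int]], List[int]],
    threshold: int,
) -> Optional[List[int]]:
    """Decode the adapted code word only if it now looks correctable."""
    if estimate_errors(adapted_codeword) < threshold:
        return decode(adapted_codeword)
    # Still too errored: leave further estimation to the caller.
    return None
```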

Alternatively, or if the adapted encoded UMID still cannot be decoded after
the first data field D1 has been replaced, the second and third data fields D2 and D3 may
be compared by the recovery processor 404 for the previous and subsequent UMIDs.
The data field D2 may be, for example, a time code. As such, and because each UMID
has been embedded in successive frames of the video images, it can be expected that
the data in the second data field D2 will increase linearly between successive UMIDs.
Therefore the recovery processor compares the second data fields D2 and, if these
differ by less than a predetermined threshold, the data field D2 of the errored
encoded UMID 0 is calculated by interpolating between the data values in the second
data fields D2 of the previous and subsequent UMIDs. After replacing
the second data field D2, the control processor 406 may then feed the recovered
encoded UMID with the redundant data to the error processor 400 and attempt to
decode the UMID once again, or simply output the UMID as recovered.
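
A minimal sketch of the interpolation step, assuming the time code is held as an
integer frame count and that the errored UMID lies midway between its two
neighbours. Both assumptions, and the function name, are illustrative only.

```python
from typing import Optional


def interpolate_time_code(prev_tc: int, next_tc: int, threshold: int) -> Optional[int]:
    """Estimate the missing time code midway between its two neighbours.

    Returns None when the neighbours do not differ by less than the threshold.
    """
    if abs(next_tc - prev_tc) < threshold:
        return (prev_tc + next_tc) // 2
    return None
```

For neighbours at frames 100 and 102 with a threshold of 5, the estimate is
frame 101.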

If, when comparing the second data fields D2, the data fields of the previous
and subsequent UMIDs differ by an amount greater than the predetermined threshold,
then it may be assumed that the UMID recovered from the previous or the subsequent
image frame relates to a separate video clip. As such the data field, or indeed the
UMID as a whole, is replaced by the corresponding field from either the earlier or the
subsequent recovered UMID. In order to determine whether the previous or the
subsequent UMID corresponds to the same video clip as the UMID 0 being recovered,
the content of the video images of the previous and subsequent frames is compared
with the image from which the UMID 0 is being recovered, as will be explained
shortly. Accordingly, the second data field D2 of the UMID to be recovered is set to
the same value as the second data field D2 of the previous UMID or subsequent
UMID, and decoding may then be re-applied. As a default, the errored UMID can be
replaced with the previous UMID.
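
The choice between the two neighbours can be sketched as below. The similarity
scores stand in for whatever image comparison is available; their scale, and the
name `choose_replacement`, are assumptions. With no scores supplied, the sketch
falls back to the previous UMID, as the text suggests.

```python
from typing import Optional, TypeVar

T = TypeVar("T")


def choose_replacement(
    prev_value: T,
    next_value: T,
    similarity_to_prev: Optional[float] = None,
    similarity_to_next: Optional[float] = None,
) -> T:
    """Pick the field value from the neighbour whose image is more similar."""
    if (
        similarity_to_prev is not None
        and similarity_to_next is not None
        and similarity_to_next > similarity_to_prev
    ):
        return next_value
    # Default: replace with the value from the previous UMID.
    return prev_value
```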

It will be appreciated that if there exists a correlation between the values of the 
third data fields D3 of the recovered UMIDs, then this correlation can be used to 
estimate the value of the third data field D3 of the errored encoded UMID. 

After all the data fields of the UMID have been estimated by interpolation or 
replacement, error correction decoding is again attempted on the encoded UMID. If 
the encoded UMID is correctable, then the encoded UMID is decoded and output as 
the recovered UMID. If however the UMID is still not correctable, then the UMID in 
the adapted form after interpolation is assumed to be correct and output as the 
recovered UMID. This is because the UMID may have been recovered correctly by 
interpolation, but because all the errors of the encoded UMID appear in the redundant 
parity bits, the encoded UMID is still considered to be uncorrectable by the error 
correction decoder. 
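
The final fallback can be sketched as follows. The decoder is a hypothetical
callable that returns `None` when the code word is uncorrectable, and the source
part is assumed, for illustration, to precede the redundant part.

```python
from typing import Callable, Optional


def finalise_umid(
    adapted_source: bytes,
    redundant: bytes,
    try_decode: Callable[[bytes], Optional[bytes]],
) -> bytes:
    """Output the decoded UMID, or the adapted estimate if decoding fails."""
    decoded = try_decode(adapted_source + redundant)
    if decoded is None:
        # The remaining errors may all lie in the redundant part, so the
        # estimated source data is assumed to be correct.
        return adapted_source
    return decoded
```
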

Once a UMID has been recovered, either by error correction or by replacement
and/or interpolation, then advantageously the errored bits from the UMID in the
watermarked image may be replaced. This provides a watermarked image which is
essentially free from errors so that subsequent processing can utilise the watermark
and/or reproduce the watermarked image.
Other Embodiments 

Returning to Figure 6, the purpose of the analysis processor 360 shown in 
Figure 6 will now be explained. The analysis processor is optionally provided to assist 
the recovery processor 404 in determining whether the data fields of the errored UMID 
should be replaced with the value of the data fields from the previous UMID-1 or the
subsequent UMID+1. To this end the analysis processor 360 is arranged to
compare the content of the watermarked images from which the previous and 
subsequent UMIDs were recovered, as well as the image from which the errored 
UMID was detected and recovered. The analysis processor 360 is arranged to generate 
signals indicative of a comparison between the content of the image from which the 
errored UMID was recovered with the content of the image from which the previous 
and/or subsequent UMIDs UMID-1, UMID+1 were recovered. The comparison can
be performed by, for example, generating a histogram of the colour in the images. The
signals representative of this comparison are fed to the recovery processor via a 
connecting channel 370. 
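
One way such a colour histogram comparison could be realised is sketched below.
Histogram intersection is used here as the similarity measure; that choice, the
RGB pixel representation, the bin count (assumed to divide 256) and the function
names are all assumptions, since the text specifies only that a histogram of the
colour may be generated.

```python
from collections import Counter
from typing import Iterable, Tuple

Pixel = Tuple[int, int, int]  # (red, green, blue), each 0..255


def colour_histogram(pixels: Iterable[Pixel], bins: int = 4) -> Counter:
    """Count pixels per quantised (r, g, b) bin."""
    step = 256 // bins
    return Counter((r // step, g // step, b // step) for r, g, b in pixels)


def histogram_similarity(a: Counter, b: Counter) -> float:
    """Histogram intersection, normalised to the range 0.0 to 1.0."""
    total = max(sum(a.values()), sum(b.values()))
    if total == 0:
        return 0.0
    overlap = sum(min(a[key], b[key]) for key in a.keys() & b.keys())
    return overlap / total


red_frame = [(255, 0, 0)] * 4
blue_frame = [(0, 0, 255)] * 4
same = histogram_similarity(colour_histogram(red_frame), colour_histogram(red_frame))
different = histogram_similarity(colour_histogram(red_frame), colour_histogram(blue_frame))
```

A higher score for one neighbour suggests that the errored frame belongs to the
same clip as that neighbour.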

The recovery processor uses the signals representing the comparison of the 
image content to determine whether the data fields of the errored UMID should 
correspond to the previous UMID-1 or the subsequent UMID+1. Accordingly, for 
example, where the content of the image from which the errored UMID was recovered 
is determined from the comparison to be more similar to the content of the image from 
which the previous UMID-1 was recovered, then the data fields of the UMID
should be replaced with data values derived from the previous UMID. For example,
the data field which is representative of the clip ID should be replaced with the clip ID
from the previous UMID-1.
The Unique Material Identifier (UMID)

A brief explanation will now be given of the structure of the UMID, with
reference to Figures 9A and 9B. The UMID is described in SMPTE Journal March
2000. Referring to Figure 9A, an extended UMID is shown to comprise a first set of
32 bytes of a basic UMID, shown in Figure 9B, and a second set of 32 bytes referred to
as signature metadata. Thus the first set of 32 bytes of the extended UMID is the basic
UMID. The components are:

• A 12-byte Universal Label to identify this as a SMPTE UMID. It defines the
type of material which the UMID identifies and also defines the methods by which the
globally unique Material and locally unique Instance numbers are created.

• A 1-byte length value to define the length of the remaining part of the UMID.

• A 3-byte Instance number which is used to distinguish between different
'instances' of material with the same Material number.

• A 16-byte Material number which is used to identify each clip. Each Material
number is the same for related instances of the same material.

The second set of 32 bytes, the signature metadata, is a set of packed
metadata items used to create an extended UMID. The extended UMID comprises the
basic UMID followed immediately by signature metadata which comprises:

• An 8-byte time/date code identifying the time and date of the Content Unit
creation.

• A 12-byte value which defines the spatial co-ordinates at the time of Content
Unit creation.

• 3 groups of 4-byte codes which register the country, organisation and user
codes.
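
The byte layout listed above can be sketched as a simple splitter. The field sizes
are those given in the text; the field order is assumed to follow the order of
listing, and the field names and function name are hypothetical.

```python
from typing import Dict

# (name, size in bytes): the basic UMID occupies the first 32 bytes.
BASIC_FIELDS = (
    ("universal_label", 12),
    ("length", 1),
    ("instance_number", 3),
    ("material_number", 16),
)
# Signature metadata occupies the second 32 bytes.
SIGNATURE_FIELDS = (
    ("time_date", 8),
    ("spatial_coordinates", 12),
    ("country", 4),
    ("organisation", 4),
    ("user", 4),
)


def split_extended_umid(umid: bytes) -> Dict[str, bytes]:
    """Split a 64-byte extended UMID into its named fields."""
    if len(umid) != 64:
        raise ValueError("an extended UMID is 64 bytes long")
    fields: Dict[str, bytes] = {}
    offset = 0
    for name, size in BASIC_FIELDS + SIGNATURE_FIELDS:
        fields[name] = umid[offset:offset + size]
        offset += size
    return fields


fields = split_extended_umid(bytes(range(64)))
```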

More explanation of the UMID structure is provided in co-pending UK patent 
application number 0008432.7. 

Various modifications may be made to the embodiments hereinbefore
described without departing from the scope of the present invention. Although in this
example embodiment the data to be embedded is added to the image in the transform
domain, in alternative embodiments the data could be represented in the transform
domain, inverse transformed into the spatial domain, and added to the data in the
spatial domain.

CLAIMS

1. An apparatus for detecting and recovering data embedded in
information material, said data comprising a plurality of source data items each having 
been encoded in accordance with a systematic error correction code to produce 
encoded data items each comprising the corresponding source data item and redundant 
data, said encoded data items being embedded in the information material, said 
apparatus comprising 

an embedded data detector operable to detect and generate a recovered version 
of said encoded data from said information material, 

an error processor operable, for each of said recovered encoded data items, to 
determine whether said recovered encoded data item is deemed too errored, and if not, 
decoding said encoded data item to generate a recovered version of said data item, 

a data store for storing said recovered version of said data item, and 

a recovery data processor operable, if said error processor determines that one 
of said recovered encoded data items is deemed too errored, to compare the source 
data item of said encoded data item, with at least one other source data item from said 
data store, and to estimate said source data item of said errored encoded data item in 
dependence upon a corresponding value of said at least one other recovered data item. 

2. An apparatus as claimed in Claim 1, wherein said error processor is 
operable to determine whether each of said recovered encoded data items is errored by 
estimating the number of errored data symbols in each of said recovered encoded data 
items, and to compare said number of errors with a predetermined threshold, said 
recovered encoded data item being determined as errored if said number of errors is 
greater than or equal to said threshold. 

3. An apparatus as claimed in Claims 1 or 2, wherein said recovery 
processor is operable to compare said source data item from said errored encoded data 
item with at least one of a previous and a subsequent decoded and recovered data item, 
and to replace said source data item of said errored encoded data item in accordance 
with at least one of said previous and subsequent source data items.

4. An apparatus as claimed in Claim 3, wherein said recovery processor is
operable, if said previous and said subsequent source data items have the same value to 
replace said source data item of said errored encoded data item with the value of said 
previous or subsequent data items. 

5. An apparatus as claimed in Claim 3, wherein said recovery processor is 
operable, if said previous and said subsequent source data items have different values
to replace said source data item of said errored encoded data item with the value 
formed by interpolating between said previous and subsequent data items. 

6. An apparatus as claimed in any preceding Claim, comprising an 
analysis processor operable to compare the content of the information material from 
which a plurality of recovered source data items and said errored encoded data item 
have been detected, and to generate data representative of the comparison, wherein 
said recovery processor is operable to estimate said source data item of said errored 
encoded data item in dependence upon said data representative of said comparison. 

7. An apparatus as claimed in Claims 1, 2 or 3, wherein each of said 
source data items comprises a plurality of data fields, and said recovery processor is 
operable to compare at least one of said data fields of said errored encoded data item 
with the corresponding field of said at least one other recovered data item, and to 
replace said at least one of said fields of said errored encoded data item with the 
corresponding field of said recovered data item in accordance with said comparison. 

8. An apparatus as claimed in Claim 7, wherein said recovery processor is 
operable, in dependence upon at least one of said data fields of said source data item 
being replaced, to determine in combination with said error processor whether said 
recovered encoded data item in which the data field is replaced is deemed to be too 
errored, and if not, decoding said encoded data item to form a recovered version of 
said data item.

9. An apparatus as claimed in Claims 7 or 8, wherein said recovery
processor is operable, if said corresponding data field of a previous and a subsequent 
data items have the same value, to set said data field of said errored encoded data item 
to the value of one of said previous and subsequent data items. 

10. An apparatus as claimed in Claims 7 or 8, wherein said recovery 
processor is operable, if said corresponding data field of a previous data item and a 
subsequent data item have different values, to replace said data field of said errored 
encoded data item with a value formed by interpolating between said previous and 
subsequent data items. 

11. An apparatus as claimed in Claims 7 or 8, wherein said recovery 
processor is operable, to determine the difference between said corresponding data 
field of a previous data item and said corresponding data field of a subsequent data 
item, and if said difference is above a predetermined threshold to replace said data 
field of said errored encoded data item which cannot be decoded with the value of said 
field of said previous data item and otherwise to form said replacement value by 
interpolating between said field of said previous and subsequent data items. 

12. An apparatus as claimed in Claims 7 or 8, comprising an analysis 
processor operable to compare the content of the information material from which a 
previous data item, a subsequent data item and said errored encoded data items were 
detected, and to generate data representative of the comparison, wherein said recovery 
processor is operable to replace said data field of said errored encoded data item which 
cannot be decoded with the value of said data field from one of said previous and said 
subsequent data items in dependence upon said comparison data. 

13. An apparatus as claimed in Claims 6 or 12, wherein said analysis 
processor is arranged to estimate the content of the information material from a colour 
histogram or the like.

14. An apparatus as claimed in any preceding Claim, wherein said
information material is at least one of video, audio, data or audio/video material, and 
said source data items include meta data describing the content or attributes relating to 
said video, audio, data or audio/video material. 

15. An apparatus as claimed in Claim 14, wherein said data items include 
Unique Material Identifiers (UMIDs), and said data fields are the fields of said UMID, 
and said encoded data items are encoded UMIDs. 

16. An apparatus as claimed in Claim 14 in combination with Claim 10, 
wherein the data field of an errored encoded UMID, which is recovered by 
interpolating contains data representative of the time code of said UMID. 

17. An apparatus as claimed in Claim 14 in combination with Claim 11,
wherein the data field of an errored encoded UMID, which is recovered by replacing
the data field with data from the corresponding field of the previous encoded UMID,
consequent upon a difference between the data fields of the previous and subsequent
recovered UMIDs being above a predetermined threshold is representative of a clip
identifier of said UMID.

18. An apparatus for embedding data into information material, said data 
comprising a plurality of source data items, said apparatus comprising 

an error correction encoder operable to encode each of said data items in 
accordance with a systematic error correction code to produce encoded data items each 
comprising the source data item and redundant data, and 

a combining processor operable to combine said encoded data items with said 
information material. 

19. An apparatus as claimed in Claim 18, wherein said data items include 
meta data such as UMIDs or the like.

20. A signal representative of information material in which data have been
embedded by the apparatus claimed in Claims 18 or 19. 

21. A system for embedding and removing data from information material, 
said system comprising

an apparatus for embedding the data into the information material according to 
Claims 18 or 19, and 

an apparatus for detecting and removing the data from the information material 
according to any of Claims 1 to 17. 

22. A method of detecting and recovering data embedded in information 
material, said data comprising a plurality of source data items each having been 
encoded in accordance with a systematic error correction code to produce encoded 
data items, each encoded data item comprising the corresponding source data item and
redundant data, said encoded data items being embedded in the information material,
said method comprising 

detecting and generating a recovered version of said encoded data items from 
said information material, 

determining, for each of said encoded data items, whether the recovered 
version of said encoded data item is deemed too errored, and

if not, decoding said encoded data item to generate a recovered version of said 
data item, and storing said recovered version of said data item, and 

if said errored encoded data item is deemed too errored, comparing said source 
data from said errored encoded data item with at least one other source data item from 
said data store, and estimating said source data item of said errored encoded data item
in dependence upon a corresponding value of said other recovered data item. 

23. A method of embedding data in information material, said data 
comprising a plurality of source data items, said method comprising 

encoding each of said data items in accordance with a systematic error
correction code to produce encoded data items each comprising the corresponding said
source data item and redundant data, and

combining said encoded data items with said information material.

24. A computer program providing computer executable instructions, 
which when loaded on to a data processor configures said data processor to operate as 
an apparatus according to any of Claims 1 to 19. 

25. A computer program having computer executable instructions, which 
when loaded on to a data processor causes the data processor to perform the method 
according to Claims 22 or 23. 

26. A computer program product having a computer readable medium 
having recorded thereon information signals representative of the computer program 
claimed in any of Claims 24 or 25. 

27. An apparatus as hereinbefore described with reference to the
accompanying drawings. 

28. A method of detecting and recovering data embedded in an image as 
hereinbefore described with reference to the accompanying drawings.




ABSTRACT 

APPARATUS FOR DETECTING AND RECOVERING DATA 

A system for embedding a plurality of data items in and recovering the data 
items from information material includes an apparatus for embedding the data 
comprising an error correction encoder operable to encode each of the data items in 
accordance with a systematic error correction code to produce encoded data items, 
each comprising the corresponding source data item and redundant data, and a 
combining processor operable to combine the encoded data items with the information 
material. 

The system further comprises an apparatus for detecting and recovering the
embedded data from the information material, which comprises an embedded data detector
operable to detect and generate a recovered version of the error correction encoded 
data from the information material, an error processor operable, for each of the 
recovered encoded data items, to determine whether the recovered encoded data item 
is deemed too errored, and if not, decoding the encoded data item to generate a 
recovered version of the data item, a data store for storing the recovered version of the 
data item, and a recovery data processor operable, if the error processor determines 
that one of the recovered encoded data items is errored, to compare the source data 
item of the errored encoded data item, with at least one other source data item from the 
data store, and to estimate the source data item of the errored encoded data item 
consequent upon a corresponding value of the other recovered data item. 



[Fig 7] 



